Why ggplot2?
The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want.
Building blocks of a graph include:
Compared to base graphics, ggplot2
Aesthetics are things that you can see. Examples include:
Aesthetic mappings are set with the aes() function.
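As a small illustration of an aesthetic mapping, the sketch below ties three columns to position and colour. It uses `mtcars`, a data set that ships with R, as a stand-in for any data frame; the column choices are only for illustration and are not part of the original material.

```r
library(ggplot2)

# mtcars ships with R; it stands in for any data frame
p <- ggplot(data = mtcars,
            aes(x = wt, y = mpg, colour = factor(cyl)))

# The mapping is stored on the plot object; printing p shows empty axes
# because no geom has been added yet
p
```

The mapping alone says which variables drive which visual properties; the marks themselves come from the geoms added later.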
Geometric objects are the actual marks we put on a plot. Examples include:
points (geom_point)
lines (geom_line)
boxplots (geom_boxplot)
A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
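A minimal sketch of layering geoms with `+` might look like the following. It again uses the built-in `mtcars` data as a stand-in, and the choice of `geom_smooth()` with a linear fit is an assumption made only to show a second layer.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# One geom is enough to draw the plot
p_points <- p + geom_point()

# Further geoms are layered on top with +
p_both <- p + geom_point() + geom_smooth(method = "lm", formula = y ~ x)
p_both
```

Each `+` appends a layer, so the same base object `p` can be reused with different combinations of geoms.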
We will use data from the NCAA basketball tournament from 2011 - 2016.
library(tidyverse) # loads readr (read_csv), dplyr (filter, %>%), and ggplot2
hoops <- read_csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/TourneyDetailedResults.csv')
hoops_2011 <- hoops %>% filter(Season >= 2011)
hoops_2011
## # A tibble: 402 x 34
## Season Daynum Wteam Wscore Lteam Lscore Wloc Numot Wfgm Wfga Wfgm3 Wfga3
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2011 134 1155 70 1412 52 N 0 26 50 4 13
## 2 2011 134 1421 81 1114 77 N 1 27 54 4 12
## 3 2011 135 1427 70 1106 61 N 0 23 54 4 16
## 4 2011 135 1433 59 1425 46 N 0 20 59 9 24
## 5 2011 136 1139 60 1330 58 N 0 22 54 7 26
## 6 2011 136 1140 74 1459 66 N 0 24 61 6 22
## 7 2011 136 1153 78 1281 63 N 0 29 54 4 11
## 8 2011 136 1163 81 1137 52 N 0 32 66 9 24
## 9 2011 136 1196 79 1364 51 N 0 29 53 8 23
## 10 2011 136 1211 86 1385 71 N 0 28 52 9 15
## # … with 392 more rows, and 22 more variables: Wftm <dbl>, Wfta <dbl>,
## # Wor <dbl>, Wdr <dbl>, Wast <dbl>, Wto <dbl>, Wstl <dbl>, Wblk <dbl>,
## # Wpf <dbl>, Lfgm <dbl>, Lfga <dbl>, Lfgm3 <dbl>, Lfga3 <dbl>, Lftm <dbl>,
## # Lfta <dbl>, Lor <dbl>, Ldr <dbl>, Last <dbl>, Lto <dbl>, Lstl <dbl>,
## # Lblk <dbl>, Lpf <dbl>
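The plots that follow build on an object named `graph.a`, whose definition does not appear in the text. A plausible sketch, assuming it maps losing-team field goals (`Lfgm`) to x and winning-team field goals (`Wfgm`) to y as the axis labels used later suggest, is shown below. The fallback data frame is a placeholder so the sketch runs without the download: its `Wfgm` values are the ten printed above, while its `Lfgm` values are made up for illustration.

```r
library(ggplot2)

# Placeholder rows used only if hoops_2011 has not been loaded.
# Wfgm: the ten values printed above. Lfgm: made-up illustrative values.
if (!exists("hoops_2011")) {
  hoops_2011 <- data.frame(
    Wfgm = c(26, 27, 23, 20, 22, 24, 29, 32, 29, 28),
    Lfgm = c(20, 25, 22, 18, 21, 23, 24, 19, 18, 26)
  )
}

# Assumed definition: losing team on x, winning team on y
graph.a <- ggplot(data = hoops_2011, aes(x = Lfgm, y = Wfgm))
graph.a + geom_point()
```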
geom_point()
geom_smooth()
geom_rug()
geom_density2d()
graph.a + geom_point() +
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  geom_rug() + geom_density2d()
geom_jitter()
labs()
graph.a + geom_rug() + geom_density2d() +
  geom_jitter() +
  labs(x = 'Losing Team Field Goals Made',
       y = 'Winning Team Field Goals Made')
xlim() and ylim()
graph.a + geom_rug() + geom_density2d() +
  geom_jitter() +
  labs(x = 'Losing Team Field Goals Made',
       y = 'Winning Team Field Goals Made') +
  xlim(c(0,max(hoops_2011$Wfgm))) + ylim(c(0,max(hoops_2011$Wfgm)))
There is a wide range of themes available in ggplot2: theme overview
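To give a sense of how themes are applied, here is a short sketch using two of the complete themes that ship with ggplot2, `theme_bw()` and `theme_minimal()`. The `mtcars` data and the specific themes chosen are assumptions for illustration only.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point()

# Complete themes restyle the whole plot in one call
p_bw <- p + theme_bw()
p_minimal <- p + theme_minimal()

# theme() adjusts individual elements on top of a complete theme
p_bw + theme(plot.title = element_text(size = 8))
```

A complete theme sets every element at once, so any `theme()` tweaks should come after it in the chain; otherwise the complete theme overwrites them.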
Use the Seattle Housing data set (http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv) to create an interesting graphic. Include informative titles and labels, and add an annotation.
seattle_in <- read_csv('http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv')
## Parsed with column specification:
## cols(
## price = col_double(),
## bedrooms = col_double(),
## bathrooms = col_double(),
## sqft_living = col_double(),
## sqft_lot = col_double(),
## floors = col_double(),
## waterfront = col_double(),
## sqft_above = col_double(),
## sqft_basement = col_double(),
## zipcode = col_double(),
## lat = col_double(),
## long = col_double(),
## yr_sold = col_double(),
## mn_sold = col_double()
## )
Now use ggplot2 to create an interesting graph using the Seattle Housing data set.
## `geom_smooth()` using formula 'y ~ x'
# Treat zipcode as a categorical variable so it maps to discrete colors
seattle_in$zipcode <- as.factor(seattle_in$zipcode)
graph.a <- ggplot(data = seattle_in, aes(sqft_living, price))
graph.a + geom_jitter(aes(col = zipcode)) +
  theme(plot.title = element_text(size = 8),
        text = element_text(size = 6)) +
  geom_smooth(method = 'loess') +
  ggtitle('Seattle Housing Sales: Price vs. Square Footage Living Space') +
  ylab('Sales Price (million dollars)') +
  xlab('Living Space (square feet)') +
  # Label the y-axis in millions of dollars
  scale_y_continuous(breaks = seq(0, 7000000, by = 1000000),
                     labels = as.character(0:7)) +
  # Draw the shaded box first so the text sits on top of it
  annotate("rect", xmin = 0, xmax = 7250, ymin = 5500000, ymax = 6500000, alpha = .6) +
  annotate('text', 3500, 6000000,
           label = 'Housing price depends on zipcode', size = 2) +
  # annotate() draws a single arrow; geom_segment(aes(...)) with constants draws one per row
  annotate("segment", x = 3500, xend = 3500, y = 5500000, yend = 3000000,
           arrow = arrow(length = unit(0.5, "cm")))